Abstract

A general, reusable computational resource has been developed
within the Penman text generation project for organizing domain
knowledge appropriately for linguistic realization. This resource,
called the upper model, provides a domain- and task-independent
classification system that supports sophisticated natural language
processing while significantly simplifying the interface between
domain-specific knowledge and general linguistic resources. This
paper presents the results of our experiences in designing and
using the upper model in a variety of applications over the past
5 years. In particular, we present our conclusions concerning the
appropriate organization of an upper model, its domain-independence,
and the types of interrelationships that need to be supported
between upper model and grammar and semantics.

Introduction:  interfacing  with  a  text 
generation  system 

Consider the task of interfacing a domain-independent, reusable,
general text generation system with a particular application
domain, in order to allow that application to express
system-internal information in one or more natural languages.
Internal information needs to be related to strategies for
expressing it. This could be done in a domain-specific way by
coding how the application domain requires its information to
appear. This is clearly problematic, however: it requires detailed
knowledge on the part of the system builder both of how the
generator controls its output forms and of the kinds of
information that the application domain contains. A more general
solution to the interfacing problem is thus desirable.

We have found that the definition of a mapping between knowledge
and its linguistic expression is facilitated if it is possible to
classify any particular instances of facts, states of affairs,
situations, etc. that occur in terms of a set of general objects
and relations of specified types that behave systematically with
respect to their possible linguistic realizations. This approach
has been followed within the PENMAN text generation system [Mann
and Matthiessen, 1985; The Penman Project, 1989] where, over the
past 5 years, we have been developing and using an extensive,
domain- and task-independent organization of knowledge that
supports natural language generation: this level of organization
is called the upper model [Bateman et al., 1990; Mann, 1985; Moore
and Arens, 1985].

The majority of natural language processing systems currently
planned or under development now recognize the necessity of some
level of abstract ‘semantic’ organization similar to the upper
model that classifies knowledge so that it may be more readily
expressed linguistically.¹ However, they mostly suffer from either
a lack of theoretical constraint concerning their internal
contents and organization and the necessary mappings between them
and surface realization, or a lack of abstraction which binds them
too closely to linguistic form. It is important both that the
contents of such a level of abstraction be motivated on good
theoretical grounds and that the mapping between that level and
linguistic form be specifiable.

Our extensive experiences with the implementation and use of a
level of semantic organization of this kind within the PENMAN
system now permit us to state some clear design criteria and a
well-developed set of necessary functionalities.

The  Upper  Model’s  Contribution  to  the 
Solution  to  the  Interface  Problem: 
Domain  independence  and  reusability 
The upper model decomposes the mapping problem by establishing a
level of linguistically motivated knowledge organization
specifically constructed as a response to the task of constraining
linguistic realizations²; generally we refer to this level of
organization as meaning rather than as knowledge in order to
distinguish it from language-independent knowledge and to
emphasize its tight connection with linguistic forms (cf.
[Matthiessen, 1987:259-260]). While it may not be reasonable to
insist that application domains organize their knowledge in terms
that respect linguistic realizations (as this may not provide
suitable organizations for, e.g., domain-internal reasoning), we
have found that it is reasonable, indeed essential, that domain
knowledge be so organized if it is also to support expression in
natural language relying on general natural language processing
capabilities.

¹ Including, for example: the Functional Sentence Structure of
XTRA: [Allgayer et al., 1989]; [Chen and Cha, 1988]; [Dahlgren et
al., 1989]; POLYGLOSS: [Emele et al., 1990]; certain of the Domain
and Text Structure Objects of SPOKESMAN: [Meteer, 1989];
TRANSLATOR: [Nirenberg et al., 1987]; the Semantic Relations of
EUROTRA-D: [Steiner et al., 1987]; JANUS: [Weischedel, 1989].
Space naturally precludes detailed comparisons here: see [Bateman,
1990] for further discussion.

The general types constructed within the upper model necessarily
respect generalizations concerning how distinct semantic types can
be realized. We then achieve the necessary link between particular
domain knowledge and the upper model by having an application
classify its knowledge organization in terms of the general
semantic categories that the upper model provides. This does not
require any expertise in grammar or in the mapping between upper
model and grammar. An application needs only to concern itself
with the ‘meaning’ of its own knowledge, and not with fine details
of linguistic form. This classification functions solely as an
interface between domain knowledge and upper model; it does not
interfere with domain-internal organization. The text generation
system is then responsible for realizing the semantic types of the
level of meaning with appropriate grammatical forms.³ Further,
when this classification has been established for a given
application, application concepts can be used freely in input
specifications since their possibilities for linguistic
realization are then known. This supports two significant
functionalities:

• interfacing with a natural language system is radically
simplified since much of the information specific to language
processing is factored out of the input specifications required
and into the relationship between upper model and linguistic
resources;

• the need for domain-specific linguistic processing rules is
greatly reduced since the upper model provides a
domain-independent, general and reusable conceptual organization
that may be used to classify all domain-specific knowledge when
linguistic processing is to be performed.

² Although my discussion here is oriented towards text generation,
our current research aims at fully bi-directional linguistic
resources [Kasper, 1988; Kasper, 1989]; the mapping is therefore
to be understood as a bi-directional mapping throughout.

³ This is handled in the PENMAN system by the grammar’s inquiry
semantics, which has been described and illustrated extensively
elsewhere (e.g., [Bateman, 1988; Mann, 1983; Matthiessen, 1988]).


An example of the simplification that use of the upper model
offers for a text generation system interface language can be seen
by contrasting the input specification required for a generator
such as MUMBLE-86 [Meteer et al., 1987], which employs realization
classes considerably less abstract than those provided by the
upper model, with the input required for Penman.⁴ Figure 1 shows
corresponding inputs for the generation of the simple clause:
Fluffy is chasing little mice. The appropriate classification of
domain knowledge concepts such as chase, cat, mouse, and little in
terms of the general semantic types of the upper model (in this
case, directed-action, object, object, and size respectively; for
definitions see [Bateman et al., 1990]) automatically provides
information about syntactic realization that needs to be
explicitly stated in the MUMBLE-86 input (e.g.,
S-V-O_two-explicit-args, np-common-noun, restrictive-modifier,
adjective). Thus, for example, the classification of a concept
mouse as an object in the upper model is sufficient for the
grammar to consider a realization such as, in MUMBLE-86 terms, a
general-np with a particular np-common-noun and accessories of
gender neuter. Similarly, the classification of chase as a
directed-action opens up linguistic realization possibilities
including clauses with a certain class of transitive verbs and
characteristic possibilities for participants, corresponding
nominalizations, etc. Such low-level syntactic information is
redundant for the PENMAN input.
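
The PENMAN input in Figure 1 is a nested term of the form
(variable / concept :role value ...). As a minimal sketch of how an
application might hold such a specification as a data structure,
the following Python fragment parses that notation into nested
dictionaries. The function names tokenize and parse, the
dictionary layout, and the variable names inside the
specification are assumptions made for this illustration; they
are not part of PENMAN.

```python
import re


def tokenize(text):
    """Split a specification into parentheses and whitespace-separated atoms."""
    return re.findall(r"[()]|[^\s()]+", text)


def parse(tokens, pos=0):
    """Parse one '(var / concept :role value ...)' term starting at tokens[pos].

    Returns the parsed term and the position just past its closing parenthesis.
    """
    if tokens[pos] != "(" or tokens[pos + 2] != "/":
        raise ValueError("expected '(variable / concept ...'")
    term = {"var": tokens[pos + 1], "concept": tokens[pos + 3], "roles": {}}
    pos += 4
    while tokens[pos] != ")":
        role = tokens[pos].lstrip(":")
        if tokens[pos + 1] == "(":
            # The role filler is itself a nested term.
            value, pos = parse(tokens, pos + 1)
        else:
            # The role filler is an atom, e.g. a tense or a variable name.
            value, pos = tokens[pos + 1], pos + 2
        term["roles"][role] = value
    return term, pos + 1


# Illustrative specification modelled on the PENMAN input of Figure 1.
SPEC = """
(e / chase
   :actor (c / cat :name Fluffy)
   :actee (m / mouse
             :size-ascription (s / little)
             :multiplicity-q multiple
             :singularity-q nonsingular)
   :tense present-progressive)
"""

term, _ = parse(tokenize(SPEC))
print(term["concept"], sorted(term["roles"]))
```

Note that nothing in the parsed structure mentions noun phrases,
determiners or modifier attachment: only domain concepts and roles
appear, which is the point of the comparison with MUMBLE-86.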

The further domain-independence of the upper model is shown in the
following example of text generation control. Consider two rather
different domains: a navy database of ships and an expert system
for digital circuit diagnosis.⁵ The navy database contains
information concerning ships, submarines, ports, geographical
regions, etc. and the kinds of activities that ships, submarines,
etc. can take part in. The digital circuit diagnosis expert system
contains information about subcomponents of digital circuits, the
kinds of connections between those subcomponents, their possible
functions, etc. A typical sentence from each domain might be:

circuit domain: The faulty system is connected to the input

navy domain: The ship which was inoperative is sailing to Sasebo

⁴ Note that this is not intended to single out MUMBLE-86: the
problem is quite general; cf. unification-based frameworks such as
[McKeown and Paris, 1987], or the Lexical Functional Grammar
(LFG)-based approach of [Momma and Dörre, 1987]. As mentioned
above, the current developments within most such approaches are
now considering extensions similar to that covered by the upper
model.

⁵ These are, in fact, two domains with which we have had
experience generating texts using the upper model.

    (general-clause
      :head (CHASES/S-V-O_two-explicit-args
              (general-np
                :head (np-proper-name "Fluffy")
                :accessories (:number singular
                              :gender masculine
                              :person third
                              :determiner-policy no-determiner))
              (general-np
                :head (np-common-noun "mouse")
                :accessories (:number plural
                              :gender neuter
                              :person third
                              :determiner-policy initially-indefinite)
                :further-specifications
                  ((:attachment-function restrictive-modifier
                    :specification
                      (predication-to-be *self*
                        (adjective "little"))))))
      :accessories (:tense-modal present :progressive
                    :unmarked))

Input to MUMBLE-86 for the clause:
Fluffy is chasing little mice
from: Meteer, McDonald, Anderson, Forster, Gay, Huettner, and
Sibun (1987)

    (e / chase
       :actor (c / cat :name Fluffy)
       :actee (m / mouse
                 :size-ascription (s / little)
                 :multiplicity-q multiple
                 :singularity-q nonsingular)
       :tense present-progressive)

Corresponding input to PENMAN

Figure 1: Comparison of input requirements for MUMBLE-86 and
PENMAN

The input specifications for both of these sentences are shown in
Figure 2. These specifications freely intermix upper model roles
and concepts (e.g., domain, range, property-ascription) and the
respective domain roles and concepts (e.g., system, faulty, input,
destination, sail, ship, inoperative). Both forms are rendered
interpretable by the subordination of the domain concepts to the
single generalized hierarchy of the upper model. This is
illustrated graphically in Figure 3. Here we see the single
hierarchy of the upper model being used to subordinate concepts
from the two domains. The domain concept system, for example, is
subordinated to the upper model concept object, domain concept
inoperative to upper model concept quality, etc. By virtue of
these subordinations, the grammar and semantics of the generator
can interpret the input specifications in order to produce
appropriate linguistic realizations: the upper model concept
object licenses a particular set of realizations, as do the
concepts quality, material-process, etc.
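
The subordination of two domains to one hierarchy can be sketched
as follows. This is an illustration under stated assumptions, not
the LOOM representation used in PENMAN: the root category "thing",
the contents of LICENSES, and every name in the code are
hypothetical, and only the attachments system to object and
inoperative to quality are taken from the text above.

```python
# Hypothetical fragment of the upper model: category -> parent category.
# The root "thing" is an assumption made for this sketch.
UPPER_MODEL = {
    "object": "thing",
    "quality": "thing",
    "material-process": "thing",
}

# Illustrative realization possibilities licensed by each category. In the
# generator this information lives in the grammar and semantics, not in a table.
LICENSES = {
    "object": ["nominal group"],
    "quality": ["adjective", "relative clause"],
    "material-process": ["clause"],
}

# Each application only states where its concepts attach to the upper model.
# system -> object and inoperative -> quality follow the text; the other
# attachments are assumed for the example.
CIRCUIT_DOMAIN = {"system": "object", "faulty": "quality"}
NAVY_DOMAIN = {
    "ship": "object",
    "inoperative": "quality",
    "sail": "material-process",
}


def licensed_realizations(concept, domain):
    """Climb from a domain concept into the upper model until a category
    with licensed realizations is found; return an empty list otherwise."""
    category = domain[concept]
    while category is not None:
        if category in LICENSES:
            return LICENSES[category]
        category = UPPER_MODEL.get(category)
    return []


print(licensed_realizations("system", CIRCUIT_DOMAIN))
print(licensed_realizations("inoperative", NAVY_DOMAIN))
```

Neither domain table contains any grammatical information; adding
a third domain would mean adding one more table of attachments and
leaving UPPER_MODEL and LICENSES untouched.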

    (v1 / connects
        :domain (v2 / system
                    :relations
                      (v3 / property-ascription
                          :domain v2
                          :range (v4 / faulty)))
        :range (v5 / input)
        :tense present)

Input for digital circuit example sentence:
The faulty system is connected to the input

    (v1 / sail
        :actor (v2 / ship
                   :relations
                     (v3 / property-ascription
                         :domain v2
                         :range (v4 / inoperative)
                         :tense past))
        :destination (sasebo / port)
        :tense present-progressive)

Input for navy example sentence:
The ship which was inoperative is sailing to Sasebo

Figure 2: Input specifications from navy and digital circuit
domains

Our present upper model contains approximately 200 such
categories, as motivated by the requirements of the grammar, and
is organized as a structured inheritance lattice represented in
the LOOM knowledge representation language [MacGregor and Bates,
1987]. Generally, the upper model represents the speaker’s
experience in terms of generalized linguistically-motivated
‘ontological’ categories. More specifically, the following
information is required (with example categories drawn from the
current PENMAN upper model):

• abstract specifications of process-type/relations and
configurations of participants and circumstances (e.g.,
NONDIRECTED-ACTION, ADDRESSEE-ORIENTED-VERBAL-PROCESS, ACTOR,
SENSER, RECIPIENT, SPATIO-TEMPORAL, CAUSAL-RELATION,
GENERALIZED-MEANS),

• abstract specifications of object types, for, e.g., semantic
selection restrictions (e.g., DECOMPOSABLE-OBJECT, ABSTRACTION,
PERSON, SPATIAL-TEMPORAL),

• abstract specifications of quality types, and the types of
entities which they may relate (e.g., BEHAVIORAL-QUALITY,
SENSE-AND-MEASURE-QUALITY, STATUS-QUALITY),

• abstract specifications of combinations of events (e.g.,
DISJUNCTION, EXEMPLIFICATION, RESTATEMENT).

These are described in full in [Bateman et al., 1990].

Appropriate linguistic realizations are not in a one-to-one
correspondence with upper model concepts, however. The
relationship needs to be rather more complex and so the question
of justification of upper model concepts and organization becomes
crucial.
[Figure 3 diagram: the UPPER MODEL hierarchy with the DOMAIN
MODELS subordinated to it]

Figure 3: Upper model organization reuse with differing domains


Degree  of  Abstraction  vs.  Linguistic 
Responsibility 

The general semantic types defined by a level of meaning such as
the upper model need to be ‘linguistically responsible’, in that
mappings between them and linguistic form may be constructed. In
addition, to be usable by an application, they must also be
sufficiently operationalizable so as to support consistent coding
of application knowledge. Both of these requirements have tended
to push the level of organization defined closer towards
linguistic form. However, it is also crucial for this organization
to be sufficiently abstract, i.e., removed from linguistic form,
so that it is possible for an application to achieve its
classification purely on grounds of meaning. It is thus inadequate
to rely on form-oriented criteria for upper model construction
because grammatical classifications are often non-isomorphic to
semantic classifications: they therefore need to deviate from
semantic organization in order to respect the syntactic criteria
that define them. Reliance on details of linguistic realization
also compromises the design aim that the applications should not
be burdened with grammatical knowledge.⁶

⁶ This is also resonant with the design aim in text generation
that higher level processes (e.g., text planners) should not need
direct access to low level information such as the grammar [Hovy
et al., 1988]. For descriptions of all these distinctions in
detail, see the PENMAN documentation [The Penman Project, 1989].

Thus, the level of abstraction of an upper model must be
sufficiently high that it generalizes across syntactic
alternations, without being so high that the mapping between it
and surface form is impossible to state. This tension between the
requirements of abstractness and linguistic responsibility
presents perhaps the major point of general theoretical difficulty
and interest for future developments of upper model-like levels of
meaning. Without a resolution, substantive progress that goes
beyond revisions of what the PENMAN upper model already contains
is unlikely to be achieved. It is essential for constraints to be
found for what an upper model should contain and how it should be
organized so that an appropriate level of abstraction may be
constructed.

Constraining the Organization of an
Upper Model

Figure 4 sets out several methodologies that have been pursued for
uncovering the organization and contents of a level of meaning
such as an upper model, with examples of approaches that have
adopted them, along the continuum of abstraction from linguistic
form to abstract ontology. While the problem of being too bound to
linguistic form has been mentioned, there are also severe problems
with attempts to construct an upper model independent of form and
motivated by other criteria, e.g., a logical theory of the
organization of knowledge per se. Without a strong theoretical
connection to the linguistic system the criteria for organizing an
abstraction hierarchy remain ill-specified; there is very little
guarantee that such systems will organize themselves in a way
appropriate for interfacing well with the linguistic system.⁷

⁷ Furthermore, the experience of the JANUS project (e.g.,
[Weischedel, 1989]) has been that the cost of using a sufficiently
rich logic to permit axiomatization of the complex phenomena
required is very high, motivating augmentation by an abstraction
hierarchy very similar to that of the upper model and facing the
same problem of definitional criteria.

[Figure 4 diagram; the only surviving label is Weischedel (1989)]

Figure 4: Sources of motivations for upper model development

An alternative route is offered by the approaches in the middle of
the continuum, i.e., those which abstract beyond linguistic form
but which still maintain a commitment to language as a motivating
force. This is further strengthened by the notion, now resurgent
within current linguistics, that the organization of language
informs us about the organization of ‘knowledge’ (e.g., [Halliday,
1978; Jackendoff, 1983; Langacker, 1987; Matthiessen, 1987; Talmy,
1987]): that is, the relation between grammar and
semantics/meaning is not arbitrary. Detailed theories of grammar
can then be expected to provide us with insights concerning the
organization that is required for the level of meaning.

We have found that the range of meanings required to support one
particular generalized functional region of the grammar developed
within the PENMAN system provides a powerful set of organizing
constraints concerning what an upper model should contain. It
provides for the representation of ‘conceptual’ meanings at a high
level of abstraction while still maintaining a mapping to
linguistic form. This functional region corresponds with the
Systemic Functional Linguistic notion of the experiential
metafunction [Matthiessen, 1987], one of four generalized meaning
types which are simultaneously and necessarily made whenever
language is used. Any sentence must contain contributions to its
function from all four ‘metafunctions’, each metafunction
providing a distinct type of constraint. The value of this
factorization of distinct meaning types as far as the design of an
upper model is concerned can best be seen by examining briefly
what it excludes from consideration for inclusion within an upper
model: i.e., all information that is controlled by the remaining
three metafunctions should not be represented.

The logical metafunction is responsible for the construction of
composite semantic entities using the resources of
interdependency; it is manifested in grammar by dependency
relationships such as those that hold between the head of a phrase
and its dependents and the association of concepts to be expressed
with particular heads in the sentence structure. The removal of
this kind of information permits upper model specifications to be
independent of grammatical constituents and grammatical dominance
relations.

This relaxes, for example, the mapping between objects and
processes at the upper model level and nominals and verbals at the
grammatical level, enabling generalizations to be captured
concerning the existence of verbal participants in
nominalizations, and permits the largely textual variations shown
in (1) and (2)⁸ to be removed from the upper model coding.

(1) It will probably rain tomorrow
    It is likely that it will rain tomorrow
    There is a high probability that it will rain tomorrow

(2) independently
    in a way that is independent

No change in upper model representation or classification is
required to represent these variations.

⁸ Example taken from [Meteer, 1988].

This can be seen more specifically by considering the following
PENMAN input specification that uses only upper model terms:

    ((c0 / cause-effect
         :domain discharge
         :range breakdown)
     (discharge / directed-action
         :actee (electricity / substance))
     (breakdown / nondirected-action
         :actor (system / object)))

This states that there are two configurations of processes and
participants, one classified as an upper model directed-action and
the other as a nondirected-action, which are related by the upper
model relationship cause-effect. Now, the assignment of concepts
to differently ‘ranked’ heads in the grammar governs realization
variants including the following:

Electricity being discharged resulted in the system breaking down.

Because electricity was discharged, the system broke down.

Because of electricity being discharged the system broke down.

... the breakdown of the system due to an electrical discharge ...

Electricity was discharged causing the system to break down.

... an electrical discharge causing the breakdown of the system ...

etc.

Many such ‘paraphrase’ issues are currently of concern within the
text generation community (e.g., [Meteer, 1988; Iordanskaja et
al., 1988; Bateman and Paris, 1989; Bateman, 1989]).

The textual metafunction is responsible for the creation and
presentation of text in context, i.e., for establishing textual
cohesion, thematic development, rhetorical organization,
information salience, etc. The removal of this kind of information
allows upper model specifications to be invariant with respect to
their particular occasions of use in texts and the adoption of
textually motivated perspectives, such as, e.g., theme/rheme
selections, definiteness, anaphora, etc. Thus, with the same input
specification as above, the following variations are supported by
varying the textual constraints:

It was the electricity being discharged that resulted in the
system breaking down.

The discharge of electricity resulted in the system breaking down.

The system breaking down - the electricity being discharged did
it!

etc.

These  textual  variations  are  controlled  during  the 
construction  of  text  (cf.  [Matthiessen,  1987;  Dale,  1989; 
Hovy  and  McCoy,  1989;  Meteer,  1989;  Bateman  and 
Matthiessen,  1990])  and,  again,  are  factored  out  of  the 
upper  model. 

The interpersonal metafunction is responsible for the speaker’s
interaction with the listener, for the speech act type of an
utterance, the force with which it is expressed, etc. Thus, again
with the same input specification, the following variants are
possible:

Did electricity being discharged result in the system breaking
down?

Electricity being discharged resulted surprisingly in the whole
damn thing breaking down.

I rather suspect that electricity being discharged may have
resulted in the system breaking down.

etc.

The metafunctional factorization thus permits the upper model to
specify experiential meanings that are invariant with respect to
the linguistic alternations driven by the other metafunctions.
That is, a specification in upper model terms is consistent with a
set of linguistic realizations that may be regarded as
‘experiential paraphrases’: the specification expresses the
‘semantic’ content that is shared across those paraphrases and
often provides just the level of linguistically decommitted
representation required for nonlinguistically oriented
applications. Generation of any unique surface realization is
achieved by additionally respecting the functional constraints
that the other metafunctions bring to bear; particular surface
forms are only specifiable when a complete set of constraints from
each of the four metafunctions are combined. The application of
these constraints is directly represented in the PENMAN grammar,
which provides for the perspicuous and modular integration of many
disparate sources of information. The interdependencies between
these constraints and their conditions of applicability are also
directly represented in the grammar. This organization of the
grammar allows us to construct a rather abstract upper model while
still preserving the necessary mapping to linguistic form. The
value of achieving the abstract specification of meaning supported
by the upper model is then that it permits a genuinely
form-independent, but nevertheless form-constraining, ‘conceptual’
representation that can be used both as a statement of the
semantic contents of an utterance and as an abstract specification
of content for application domains that require linguistic output.
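
The way a fixed experiential specification combines with choices
from the other metafunctions can be illustrated with a toy
sketch. It is emphatically not the PENMAN grammar: string
templates stand in for grammatical realization, the word forms are
supplied by hand, and the names Event, realize, head and mood are
hypothetical. The head parameter loosely stands for a choice
attributed above to the logical metafunction and mood for one
attributed to the interpersonal metafunction; the three output
sentences are among the variants listed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One process-plus-participant configuration with hand-supplied forms."""
    participant: str  # e.g. "electricity"
    gerund: str       # e.g. "being discharged"
    past: str         # e.g. "was discharged"


# The experiential content, held constant across all variants below.
CAUSE = Event("electricity", "being discharged", "was discharged")
EFFECT = Event("the system", "breaking down", "broke down")


def realize(cause, effect, head="verb", mood="declarative"):
    """Pick a surface form for a cause-effect relation between two events.

    head: whether the relation is expressed by a verb or by a conjunction.
    mood: declarative or interrogative.
    """
    if head == "verb":
        subject = f"{cause.participant} {cause.gerund}"
        rest = f"in {effect.participant} {effect.gerund}"
        if mood == "interrogative":
            return f"Did {subject} result {rest}?"
        if mood == "declarative":
            return f"{subject[0].upper()}{subject[1:]} resulted {rest}."
    if head == "conjunction" and mood == "declarative":
        return (f"Because {cause.participant} {cause.past}, "
                f"{effect.participant} {effect.past}.")
    raise ValueError(f"no template for head={head!r}, mood={mood!r}")


for options in ({}, {"mood": "interrogative"}, {"head": "conjunction"}):
    print(realize(CAUSE, EFFECT, **options))
```

CAUSE and EFFECT never change between calls; only the additional
choices do, which mirrors the claim that a unique surface form is
fixed only once constraints beyond the experiential ones are
supplied.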

Summary  and  Conclusions 

A computational resource has been developed within the PENMAN text
generation project that significantly simplifies control of a text
generator. This resource, called the upper model, is a hierarchy
of concepts that captures semantic distinctions necessary for
generating natural language. Although similar levels of abstract
semantic organization are now being sought in many natural
language systems, they are often built anew for each project, are
to an unnecessary extent domain or theory specific, are required
to fulfill an ill-determined set of functionalities, and lack
criteria for their design. This paper has presented the results of
our experiences in designing and using the upper model in a
variety of applications; in particular, it presented our
conclusions concerning the appropriate source of constraints
concerning the organization of an upper model. We have found that
restricting the information contained in an upper model to
experiential meaning has significantly improved our understanding
of how a semantic hierarchy should be organized and how it needs
to relate to the rest of the linguistic system. We strongly feel,
therefore, that subsequently constructed semantic organizations
should follow the guidelines set out by the metafunctional
hypothesis; the factorization that it provides concerning what
should, and should not, be represented in an ‘abstract semantic
knowledge’ hierarchy supports functionalities well beyond those
envisioned in current text generation/understanding systems.

Acknowledgments 

The upper model has been under development for several years, and
many have contributed, and continue to contribute, to it. The
ideas I have reported on here would not have been possible without
that development. Those responsible for the present form of the
upper model include: William Mann, Christian Matthiessen, Robert
Kasper, Richard Whitney, Johanna Moore, Eduard Hovy, Yigal Arens,
and myself. Thanks also to Cecile Paris and Eduard Hovy for
improving the paper’s organization. Financial support was provided
by AFOSR contract F49620-87-C-0005, and in part by DARPA contract
MDA903-87-C-641. The opinions in this report are solely those of
the author.

References 

[Allgayer et al., 1989] Jürgen Allgayer, Karin Harbusch,
Alfred Kobsa, Carola Reddig, Norbert Reithinger,
and Dagmar Schmauks. Xtra: a natural-language
access system to expert systems. International
Journal of Man-Machine Communication,
1989.

[Bateman  and  Matthiessen,  1990]  John  A.  Bateman 
and  Christian  M.I.M.  Matthiessen.  Uncovering  the 
text  base.  In  Hermann  Bluhme  and  Hao  Keqi,  edi¬ 
tors,  Selected  Papers  from  the  International  Confer¬ 
ence  on  Research  in  Text  and  Language,  Xi’an  Jiao- 
tong  University,  Xi’an,  P.R.  China,  29-31  March 
1989.  1990. 

[Bateman  and  Paris,  1989]  John  A.  Bateman  and 
Cecile L. Paris. Phrasing a text in terms the user
can  understand.  In  Proceedings  of  the  Eleventh  Inter¬ 
national  Joint  Conference  on  Artificial  Intelligence, 
Detroit,  Michigan,  1989.  IJCAI-89. 

[Bateman  et  al.,  1990]  John  A.  Bateman,  Robert  T. 
Kasper,  Johanna  D.  Moore,  and  Richard  A.  Whitney. 
A  general  organization  of  knowledge  for  natural  lan¬ 
guage  processing:  the  penman  upper  model.  Techni¬ 
cal  report,  USC/Information  Sciences  Institute,  Ma¬ 
rina  del  Rey,  California,  1990. 

[Bateman,  1988]  John  A.  Bateman.  Aspects  of  clause 
politeness  in  Japanese:  an  extended  inquiry  seman¬ 
tics  treatment.  In  Proceedings  of  the  26th  Inter¬ 
national  Conference  on  Computational  Linguistics, 
pages  147-154,  Buffalo,  New  York,  1988.  Association 
for  Computational  Linguistics.  Also  available  as  ISI 
Reprint  Series  report  RS-88-211,  USC/Information 
Sciences  Institute,  Marina  del  Rey,  California. 

[Bateman,  1989]  John  A.  Bateman.  Upper  modelling 
for  machine  translation:  a  level  of  abstraction  for 
preserving meaning. Technical Report Eurotra-D
Working Papers, No. 12, Institut für Angewandte
Informationsforschung, Saarbrücken, West Germany,
1989. 

[Bateman,  1990]  John  A.  Bateman.  Upper  modeling: 
current  states  of  theory  and  practise,  1990.  PENMAN 
Development  Note,  USC/Information  Sciences  Insti¬ 
tute. 

[Chen  and  Cha,  1988]  Keh-Jiann  Chen  and  Chuan-Shu 
Cha.  The  design  of  a  conceptual  structure  and  its  re¬ 
lation  to  the  parsing  of  Chinese  sentences.  In  Proceed¬ 
ings  of  the  1988  International  Conference  on  Com¬ 
puter  Processing  of  Chinese  and  Oriental  Languages, 
Toronto,  Canada,  August  29  -  September  1  1988. 

[Dahlgren et al., 1989] Kathleen Dahlgren, Joyce
McDowell, and Edward P. Stabler. Knowledge represen¬
tation  for  commonsense  reasoning  with  text.  Com¬ 
putational  Linguistics,  15(3):149-170,  1989. 

[Dale,  1989]  Robert  Dale.  Cooking  up  referring  expres¬ 
sions.  In  Proceedings  of  the  Twenty-Seventh  Annual 
Meeting  of  the  Association  for  Computational  Lin¬ 
guistics,  Vancouver,  British  Columbia,  June  1989. 
Association  for  Computational  Linguistics. 


[Emele  et  al.,  1990]  Martin  Emele,  Ulrich  Heid,  Walter 
Kehl,  Stefan  Momma,  and  Remi  Zajac.  Organizing 
linguistic  knowledge  for  multilingual  generation.  In 
COLING-90,  1990.  Project  Polygloss  Paper,  Univer¬ 
sity  of  Stuttgart,  West  Germany. 

[Halliday  and  Matthiessen,  forthcoming]  Michael  A.K. 
Halliday  and  Christian  M.I.M.  Matthiessen.  The 
Bloomington  Lattice.  Technical  Report  in  prepara¬ 
tion,  University  of  Sydney,  Linguistics  Department, 
Sydney, Australia, forthcoming.

[Halliday,  1978]  Michael  A.  K.  Halliday.  Language  as 
social  semiotic.  Edward  Arnold,  London,  1978. 

[Hovy and McCoy, 1989] Eduard H. Hovy and Kathy F.
McCoy. Focusing your RST: A step towards generating
coherent multisentential text. In Proceedings of the
11th Annual Conference of the Cognitive Science
Society, pages 667-674. University of Michigan, Ann
Arbor, Michigan, August 16-19 1989. Hillsdale, New
Jersey: Lawrence Erlbaum Associates.

[Hovy  et  al.,  1988]  Eduard  H.  Hovy,  Douglas  Appelt, 
and  David  D.  McDonald.  Workshop  on  text  plan¬ 
ning  and  natural  language  generation,  August  1988. 
Sponsored  by  American  Association  for  Artificial  In¬ 
telligence. 

[Iordanskaja et al., 1988] Lidija Iordanskaja, Richard
Kittredge, and Alain Polguère. Lexical selection
and  paraphrase  in  a  meaning-text  generation  model, 
July  1988.  Presented  at  the  Fourth  International 
Workshop  on  Natural  Language  Generation.  Also  ap¬ 
pears  in  selected  papers  from  the  workshop:  Paris, 
Swartout  and  Mann  (eds.)(1990)(op.  cit.). 

[Jackendoff, 1983] Ray Jackendoff. Semantics and
Cognition.  MIT  Press,  Cambridge,  MA,  1983. 

[Kasper, 1988] Robert T. Kasper. An Experimental
Parser  for  Systemic  Grammars.  In  Proceedings  of 
the  12th  International  Conference  on  Computational 
Linguistics,  August  1988,  Budapest,  Hungary,  1988. 
Association  for  Computational  Linguistics.  Also 
available  as  Information  Sciences  Institute  Technical 
Report  No.  ISI/RS-88-212,  Marina  del  Rey,  CA. 

[Kasper,  1989]  Robert  T.  Kasper.  Unification  and  clas¬ 
sification:  an  experiment  in  information-based  pars¬ 
ing.  In  Proceedings  of  the  International  Workshop 
on  Parsing  Technologies,  pages  1-7,  1989.  28-31  Au¬ 
gust,  1989,  Carnegie-Mellon  University,  Pittsburgh, 
Pennsylvania. 

[Langacker,  1987]  Ronald  W.  Langacker.  Foundations 
of Cognitive Grammar. Stanford University Press,
Stanford,  California,  1987. 

[MacGregor and Bates, 1987] Robert MacGregor and
Raymond Bates. The loom knowledge representation
language. In Proceedings of the Knowledge-Based
Systems Workshop, 1987. Held in St. Louis,
Missouri, April 21-23, 1987. Also available as ISI
reprint series report, RS-87-188, USC/Information
Sciences Institute, Marina del Rey, CA.

[Mann  and  Matthiessen,  1985]  William  C.  Mann  and 
Christian  M.I.M.  Matthiessen.  Demonstration  of  the 
nigel text generation computer program. In J. Benson
and W. Greaves, editors, Systemic Perspectives
on  Discourse,  Volume  1.  Ablex,  Norwood,  New  Jer¬ 
sey,  1985. 

[Mann, 1983] William C. Mann. The anatomy of a
systemic  choice.  Discourse  Processes,  1983.  Also 
available  as  USC/Information  Sciences  Institute,  Re¬ 
search  Report  ISI/RR-82-104,  1982. 

[Mann, 1985] William C. Mann. Janus abstraction
structure  -  draft  1,  1985.  An  informal  project  tech¬ 
nical  memo  of  the  Janus  project  at  ISI. 

[Matthiessen, 1987] Christian M.I.M. Matthiessen.
Notes on the organi¬
zation  of  the  environment  of  a  text  generation  gram¬ 
mar.  In  G.  Kempen,  editor,  Natural  Language  Gen¬ 
eration:  Recent  Advances  in  Artificial  Intelligence, 
Psychology,  and  Linguistics.  Kluwer  Academic  Pub¬ 
lishers,  Boston/Dordrecht,  1987.  Paper  presented 
at  the  Third  International  Workshop  on  Natural 
Language  Generation,  August  1986,  Nijmegen,  The 
Netherlands. 

[Matthiessen,  1988]  Christian  M.I.M.  Matthiessen.  Se¬ 
mantics  for  a  systemic  grammar:  the  chooser  and 
inquiry framework. In James Benson, Michael Cummings,
and William Greaves, editors, Linguistics in a
Systemic  Perspective.  Benjamins,  Amsterdam,  1988. 
Also  available  as  USC/Information  Sciences  Insti¬ 
tute,  Reprint  Series  Report  ISI/RS-87-189,  Marina 
del  Rey,  CA. 

[McKeown and Paris, 1987] Kathleen R. McKeown and
Cecile L. Paris. Functional unification grammar
revisited. In Proceedings of the 25th
Annual Meeting of the ACL, Palo Alto, California,
1987. Association for Computational Linguistics.

[Mel’cuk and Zholkovskij, 1970] Igor A. Mel’cuk and
A.K. Zholkovskij. Towards a functioning “meaning-text”
model of language. Linguistics, 57:10-47, 1970.

[Meteer  et  al.,  1987]  Marie  W.  Meteer,  David  D.  Mc¬ 
Donald,  S.D.  Anderson,  D.  Forster,  L.S.  Gay,  A.K. 
Huettner,  and  P.  Sibun.  Mumble-86:  Design  and 
implementation.  Technical  Report  87-87,  COINS, 
University  of  Massachusetts,  1987. 

[Meteer,  1988]  Marie  W.  Meteer.  Defining  a  vocabu¬ 
lary  for  text  planning,  August  1988.  Presented  at 
the  AAAI-88  Workshop  on  Text  Planning  and  Real¬ 
ization,  organized  by  Eduard  H.  Hovy,  Doug  Appelt, 
David  McDonald  and  Sheryl  Young. 

[Meteer, 1989] Marie W. Meteer. The SPOKESMAN natural
language generation system. Technical Report
BBN Report No. 7090, BBN Systems and Technologies
Corporation, Cambridge, MA., 1989.

[Momma and Dörre, 1987] Stefan Momma and Jochen
Dörre. Generation from f-structures. In Ewan Klein
and Johan van Benthem, editors, Categories, Polymorphism
and Unification. Cognitive Science Centre,
University  of  Edinburgh,  Edinburgh,  Scotland,  1987. 

[Moore  and  Arens,  1985]  Johanna  D.  Moore  and  Yi- 
gal  Arens.  A  hierarchy  for  entities,  1985. 
USC/Information  Sciences  Institute,  Internal  Draft. 

[Nirenberg  et  al.,  1987]  Sergei  Nirenberg,  V.  Raskin, 
and  A.  Tucker.  The  structure  of  interlingua  in 
TRANSLATOR. In Sergei Nirenberg, editor, Machine
Translation: Theoretical and Methodological Issues.
Cambridge  University  Press,  Cambridge,  1987. 

[Paris et al., 1990] Cecile L. Paris, William R.
Swartout, and William C. Mann,
editors.  Natural  Language  Generation  in  Artificial 
Intelligence  and  Computational  Linguistics.  Kluwer 
Academic  Publishers,  1990. 

[Steiner et al., 1987] Erich H. Steiner, Ursula Eckert,
Birgit Weck, and Jutta Winter. The development of
the EUROTRA-D system of semantic relations. Technical
Report Eurotra-D Working Papers, No. 2, Institut
der angewandten Informationsforschung, Universität
des Saarlandes, Saarbrücken, West Germany,
1987.

[Steiner, 1990] Erich H. Steiner. A model of
goal-directed action as a structuring principle for the
context of situation in systemic linguistics. Mouton and
de  Gruyter,  Berlin,  1990. 

[Talmy, 1987] Leonard Talmy. The relation of grammar
to cognition. In B. Rudzka-Ostyn, editor, Topics
in  Cognitive  Linguistics.  John  Benjamins,  1987. 

[The Penman Project, 1989] The Penman Project.
The penman documentation: User guide,
primer,  reference  manual,  and  nigel  manual.  Techni¬ 
cal  report,  USC/Information  Sciences  Institute,  Ma¬ 
rina  del  Rey,  CA,  1989. 

[Weischedel,  1989]  Ralph  M.  Weischedel.  A  hybrid  ap¬ 
proach  to  representation  in  the  janus  natural  lan¬ 
guage  processor.  In  27th  Annual  Meeting  of  the  As¬ 
sociation  for  Computational  Linguistics,  pages  193- 
202,  Vancouver,  British  Columbia,  1989. 




